Reconstructing Strings from Substrings in Rounds

Authors

  • Dimitris Margaritis
  • Steven Skiena
Abstract

We establish a variety of combinatorial bounds on the tradeoffs inherent in reconstructing strings using few rounds of a given number of substring queries per round. These results lead us to propose a new approach to sequencing by hybridization (SBH) that uses interaction to dramatically reduce the number of oligonucleotides needed for de novo sequencing of large DNA fragments, while preserving the parallelism that is the primary advantage of SBH.
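The abstract contrasts the number of substring queries with the number of rounds in which they are asked. The sketch below is only an illustration of the two extremes of that tradeoff, not the authors' algorithm. `SubstringOracle` is a hypothetical stand-in for a hybridization experiment that answers a batch of "is this probe a substring of the target?" questions and counts each batch as one round. `reconstruct` is a simple fully interactive scheme that extends a known substring one base per round; for a target of length n it uses n + 2 rounds and 4(n + 2) probes. `spectrum_one_round` mimics classical one-round SBH by probing all 4^k k-mers at once. The few-round schemes the paper studies lie between these extremes and are not shown here.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import product
from typing import Iterable

DNA = "ACGT"


@dataclass
class SubstringOracle:
    """Hypothetical stand-in for a hybridization experiment.

    Answers "is this probe a substring of the hidden target?" for a batch
    of probes; every call to ask() is counted as one round.
    """
    target: str
    rounds: int = 0
    queries: int = 0

    def ask(self, probes: Iterable[str]) -> dict[str, bool]:
        batch = list(probes)
        self.rounds += 1
        self.queries += len(batch)
        return {p: p in self.target for p in batch}


def reconstruct(oracle: SubstringOracle, alphabet: str = DNA) -> str:
    """Fully interactive reconstruction: one round per base, |alphabet| probes per round."""
    s = ""
    # Phase 1: extend to the right. When no extension s+c occurs,
    # every occurrence of s ends at the end of the target, so s is its suffix.
    while True:
        answers = oracle.ask(s + c for c in alphabet)
        hits = [c for c in alphabet if answers[s + c]]
        if not hits:
            break
        s += hits[0]
    # Phase 2: s occurs exactly once (as the suffix), so at most one
    # left extension c+s can occur; stop when none does, i.e. s is the target.
    while True:
        answers = oracle.ask(c + s for c in alphabet)
        hits = [c for c in alphabet if answers[c + s]]
        if not hits:
            break
        s = hits[0] + s
    return s


def spectrum_one_round(oracle: SubstringOracle, k: int, alphabet: str = DNA) -> set[str]:
    """Classical one-round SBH: probe all |alphabet|**k k-mers at once."""
    probes = ["".join(t) for t in product(alphabet, repeat=k)]
    answers = oracle.ask(probes)
    return {p for p, hit in answers.items() if hit}


if __name__ == "__main__":
    hidden = "GATTACA"
    adaptive = SubstringOracle(hidden)
    print(reconstruct(adaptive), adaptive.rounds, adaptive.queries)  # GATTACA 9 36
    one_round = SubstringOracle(hidden)
    print(sorted(spectrum_one_round(one_round, 3)), one_round.rounds, one_round.queries)
    # ['ACA', 'ATT', 'GAT', 'TAC', 'TTA'] 1 64
```

The greedy scheme needs few probes but one round per base, so it gives up the parallelism the abstract emphasizes. The one-round spectrum keeps full parallelism but spends 4^k probes and still has to be assembled into a string (for example via an Eulerian path), which is omitted here.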

Similar Resources

A BSP/CGM Algorithm for the All-Substrings Longest Common Subsequence Problem

Given two strings X and Y of lengths m and n, respectively, the all-substrings longest common subsequence (ALCS) problem obtains the lengths of the subsequences common to X and any substring of Y. The sequential algorithm takes O(mn) time and O(n) space. We present a parallel algorithm for ALCS on a coarse-grained multicomputer (BSP/CGM) model with p < √m processors that takes O(mn/p) time an...

Sequencing by Hybridization in Few Rounds

Sequencing by Hybridization (SBH) is a method for reconstructing an unknown DNA string based on substring queries: Using hybridization experiments, one can determine for each string in a given set of strings, whether the string appears in the target string, and use this information to reconstruct the target string. We study the problem when the queries are performed in rounds, where the queries...

PASS-JOIN: A Partition-based Method for Similarity Joins

As an essential operation in data cleaning, the similarity join has attracted considerable attention from the database community. In this paper, we study string similarity joins with edit-distance constraints, which find similar string pairs from two large sets of strings whose edit distance is within a given threshold. Existing algorithms are efficient either for short strings or for long stri...

An Efficient Algorithm for Finding Similar Short Substrings from Large Scale String Data

Finding similar substrings/substructures is a central task in analyzing huge amounts of string data such as genome sequences, web documents, log data, etc. In the sense of complexity theory, the existence of polynomial time algorithms for such problems is usually trivial since the number of substrings is bounded by the square of their lengths. However, straightforward algorithms do not work for...

Analysis of correlation structures in the Synechocystis PCC6803 genome

Transfer of nucleotide strings in the Synechocystis sp. PCC6803 genome is investigated to exhibit periodic and non-periodic correlation structures by using the recurrence plot method and the phase space reconstruction technique. The periodic correlation structures are generated by periodic transfer of several substrings in long periodic or non-periodic nucleotide strings embedded in the coding ...

Publication date: 1995